Master’s Thesis: A Tuning Approach Based on Evolutionary Algorithm and Data Sampling for Boosting Performance of MapReduce Programs

Authors

Tiago R. Kepe

Eduardo Cunha de Almeida

Abstract

The Apache Hadoop data processing software operates in a complex environment composed of huge machine clusters, large data sets, and many processing jobs. Managing a Hadoop environment is time-consuming and laborious, and it requires expert users. A lack of knowledge can therefore lead to misconfigurations that degrade cluster performance, and users end up spending much of their time tuning the system instead of focusing on data analysis. To address misconfiguration issues, we propose a solution implemented on top of Hadoop. The goal is to present a self-tuning mechanism for Hadoop jobs in Big Data environments. Our self-tuning mechanism is inspired by two key ideas: (1) an evolutionary algorithm to generate and test new job configurations, and (2) data sampling to reduce the cost of the self-tuning process. From these ideas we created a framework that tests usual job configurations and obtains a new configuration suited to the current state of the environment. Experimental results show gains in job performance over both Hadoop's default configuration and the rules of thumb. In addition, the experiments demonstrate the accuracy of our solution, defined as the relation between the cost of obtaining a better configuration and the quality of the configuration reached.
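The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's framework: the parameter names in `SEARCH_SPACE`, the candidate values, and the `simulated_cost` function are all hypothetical stand-ins, and a real tuner would run the job on the sampled data and measure its running time instead of calling a toy cost model.

```python
import random

# Hypothetical search space: names and values are illustrative
# assumptions, not parameters taken from the thesis or from Hadoop.
SEARCH_SPACE = {
    "sort_buffer_mb": [50, 100, 200, 400],
    "reduce_tasks": [1, 2, 4, 8, 16],
    "compress_map_output": [False, True],
}


def sample_records(records, fraction, rng):
    """Idea (2): draw a uniform random sample so each trial run is cheap."""
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)


def random_config(rng):
    """Pick one candidate value per parameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.3):
    """Resample each parameter with probability `rate`."""
    child = dict(config)
    for name, values in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child


def simulated_cost(config, sample):
    """Toy stand-in for running the job on the sample and timing it."""
    n = len(sample)
    cost = n / config["reduce_tasks"] + 2.0 * config["reduce_tasks"]
    cost += n * 10.0 / config["sort_buffer_mb"]
    if config["compress_map_output"]:
        cost *= 0.9
    return cost


def tune(records, cost_fn, generations=20, population_size=12,
         fraction=0.1, seed=0):
    """Idea (1): evolve configurations, scoring each one on the sample."""
    rng = random.Random(seed)
    sample = sample_records(records, fraction, rng)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: cost_fn(c, sample))
        parents = ranked[: population_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda c: cost_fn(c, sample))


if __name__ == "__main__":
    best = tune(list(range(1000)), simulated_cost)
    print(best)
```

Because the better half of each generation is carried over unchanged, the best cost found on the sample never gets worse from one generation to the next. Whether a configuration chosen on a sample also performs well on the full data set is the accuracy question the abstract raises, and this sketch does not address it.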

Similar resources

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. In a homogeneous Hadoop cluster, the current Hadoop implementation and the rack-aware data placement strategy of the Hadoop distributed file system assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...

Fraud Detection of Credit Cards Using Neuro-fuzzy Approach Based on TLBO and PSO Algorithms

The aim of this paper is to detect fraud related to bank credit cards. Because of the large amount of data and the similarity among samples, traditional classification methods separate healthy and unhealthy sample behavior slowly and with low accuracy. Therefore, in this study, the Adaptive Neuro-Fuzzy Inference System (ANFIS) is used in order to reach a more efficient and accurate algorithm. By com...

GENERALIZED FLEXIBILITY-BASED MODEL UPDATING APPROACH VIA DEMOCRATIC PARTICLE SWARM OPTIMIZATION ALGORITHM FOR STRUCTURAL DAMAGE PROGNOSIS

This paper presents a new model updating approach for structural damage localization and quantification. Based on the Modal Assurance Criterion (MAC), a new damage-sensitive cost function is introduced by employing the main diagonal and anti-diagonal members of the calculated Generalized Flexibility Matrix (GFM) for the monitored structure and its analytical model. Then, ...

Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation

Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation of nonlinear mathematical functions. However, the majority of the models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...

Publication date: 2014
